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Abstract 


We  present  parallel  algorithms  to  compute  the  deter¬ 
minant  and  characteristic  polynomial  of  n*n-matrices  and  the 
gcd  of  polynomials  of  degree  sn.  The  algorithms  use  parallel 
time  O(log  n)  and  a  polynomial  number  of  processors.  We  also 
give  a  fast  parallel  Monte  Carlo  algorithm  for  the  rank  of 
matrices.  All  algorithms  work  over  arbitrary  fields. 
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1 .  Introduction 

Today’s  technology  has  motivated  recent  activity 
concerning  parallel  programs.  Much  of  this  activity  has  focus¬ 
sed  on  combinatorial  questions  (sorting,  graph  theoretic 
algorithms  etc.)  and  on  questions  relating  to  the  parallel 
architecture  itself  (routing,  queuing  ' etc. )  .  Clearly  it  is 
also  of  recognized  importance  to  investigate  algebraic  ques¬ 
tions,  and  to  this  end  we  present  algorithms  for  some  basic 
problems  such  as  computing  the  determinant  and  the  rank  of 
matrices  or  the  gcd  of  polynomials. 

There  are  two  basically  different  approaches  to  what 
constitutes  a  "fast  parallel  algorithm".  One  is  to  start 
from  a  good  sequential  algorithm  and  try  to  parallelize  it 
with  a  near-optimal  speed-up,  i.e.  try  to  achieve  parallel  time 
close  to  (sequential  time)/#  processors.  The  second  approach 
is  to  attempt  to  make  the  parallel  time  as  small  as  possible, 
allowing  an  almost  arbitrary  (e.g.  polynomially  bounded) 
number  of  processors.  While  the  first  approach  seems  to  be 
appropriate  for  the  present  technology  where  in  effect  only 
a  rather  limited  amount  of  parallelism  is  available,  it  is 
not  unreasonable  to  expect  that  some  time  in  the  future  the 
"asymptotically  fast  algorithms"  of  the  second  approach 
will  play  an  important  role.  The  situation  is  not  unlike  the 
dual  approach  to  sequential  algorithms,  where  one  is 
interested  in  both  constant  speed-up  of  known  algorithms 
(say,  by  programming  optimization)  and  the  construction  of 
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asymptotically  fast  algorithms  (even  though  the  hidden  constants 
for  the  computing  time  may  be  large)  .  Perhaps  the  reader  has 
guessed  by  now  that  here  we  pursue  the  second  approach  to  parallel 
programming . 

In  this  paper  we  discuss  two  basic  problems:  solving 
linear  equations  and  simplifying  rational  expressions.  Both  have 
nice  sequential  solutions  -  Gaussian  elimination  and  Euclid's 
algorithm  -  and  it  is  an  intriguing  question  if  there  also  exist 
fast  parallel  methods.  While  Csanky  [76]  has  given  a  fast  deter¬ 
minant  algorithm  over  fields  of  characteristic  zero,  applications 
such  as  factoring  polynomials  require  an  algorithm  that  works  over 
arbitrary  fields,  in  particular  finite  fields.  We  present  such 
an  algorithm  below,  based  on  the  general  parallelization  result 
by  Valiant-Skyum-Berkowitz-Rackoff  [81]. 

As  direct  corollaries  we  get  fast  methods  for  inverting 
matrices  and  solving  nonsingular  systems  of  linear  equations. 

Further  applications  are  the  characteristic  polynomial  of  a  matrix 
and  the  ged  of  polynomials. 

Some  interesting  combinatorial  problems  -  maximal  matchings, 
maximal  flow  -  translate  into  the  problem  of  computing  the  rank 
of  matrices.  That  seems  to  be  a  slightly  more  difficult  question. 

We  present  a  fast  parallel  Monte  Carlo  method  that  either  returns 
the  rank  of  the  input  matrix  or  reports  that  it  ^failed;  the  latter 
with  small  probability.  Applications  include  finding  a  basis  for 
the  nullspace  of  a  matrix,  finding  a  maximal  linearly  independent 
subset  of  a  given  set  of  vectors,  and  the  solution  of  a  general 
(possibly  singular)  system  of  linear  equations,  all  this  again 
over  arbitrary  fields. 
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2.  The  model 

The  algorithms  described  in  this  paper  can  be  implemented 

on  a  synchronous  shared-memory  model  of  computation  such  as  the 

PRAM  (Fortune-Wyllie  [78]),  with  arithmetic  and  tests  in  F  as 

2 

basic  operations.  The  algorithms  all  use  O(log  n)  parallel 
time  and  n^1^  processors  when  n  is  the  number  of  inputs. 

In  particular,  it  follows  that  the  determinant  and  gcd  problems 
are  in  the  appropriate  analogue  of  uniform  NC  (Pippenger  [79]), 
and  the  rank  problem  is  in  the  Monte  Carlo  or  non-uniform 
analogue  of  NC. 

When  the  ground  field  F  is  Q  or  a  finite  field,  one  can 
represent  the  inputs  as  strings  over  a  finite  alphabet  and  ask 
for  a  (say)  PRAM  with  bit  instructions  solving  the  problem. 

Our  algorithms  show  that  the  determinant  and  gcd  problems  are  in 
the  corresponding  Boolean  class  NC,  and  the  rank  problem  is  in 
the  Monte  Carlo  or  non-uniform  Boolean  version  of  NC  (in  fact, 
for  F=Q  in  NC) .  The  essential  point  for  this  is  that  (for  F=Q) 
according  to  Edmonds  [67]  (see  also  Bareiss  [68])  the  intermediate 
values  in  Gaussian  elimination  are  reasonably  small. 

The  most  appropriate  model  for  our  algorithms  seems  to  be 
a  parallel  version  of  algebraic  computation  trees  (Strassen  [81]). 
Thus  at  each  node  a  number  of  operations/tests  can  be  performed; 
the  maximal  such  number  is  the  number  of  processors  of  the  parallel 
algebraic  computation  tree.  If  the  input  space  is  Fn  and  the 
computation  is  defined  at  all  inputs,  we  get  a  partition 
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*  I 


consisting  of  the  inputs  for  which  the  computation  terminates 

at  that  leaf.  On  each  S.,  a  sequence  f.  =  (f f. . )  of 

1  XIX  1 X* 

rational  functions  is  computed.  Such  an  object  (f,S)  with 
f=(fj, . . .  ,f^)  is  called  a  collection;  see  Strassen  [81]. 

The  fast  parallel  algorithms  that  we  seek  should  have 
three  properties:  small  parallel  time,  small  number  of  processors, 
small  size.  More  precisely,  the  parallel  time  should  be  poly¬ 
nomial  in  logn,  the  number  of  processors  and  the  number  of 
nodes  of  the  tree  should  be  polynomial  in  n  where  n  is  the 

number  of  inputs.  The  algorithms  that  we  present  have  these 

2 

properties,  in  particular  time  O(log  n) . 

The  basic  steppingstone  for  the  whole  theory  is  the 
result  by  Valiant-Skyum-Berkowitz-Rackoff  [81],  It  says  that 
any  sequential  program  computing  a  polynomial  of  degree  sn 
with  t  steps  can  be  converted  to  a  parallel  program  with  parallel 
time  O(logn(logn+logt) )  using  0(t3n6)  processors. 
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3.  Determinant  and  gcd 

In  this  section  we  discuss  the  following  problems: 

DETERMINANT (n)  (=»  computing  the  determinant  of  an  n*n-matrix) , 

CHARACTERISTIC  POLYNOMIAL (n) ,  NONSINGULAR  EQUATIONS  (n) 

(  =  computing  the  solution  of  a  nonsingular  n*n-system  of  linear 

equations),  INVERSION(n)  (=  computing  the  inverse  of  an  n*n- 

matrix,  if  it  is  nonsingular),  GCD(n)  (  =  computing  the  monic 

gcd(f,g),  where  f,g  e  F[x]  have  degree  s  n)  . 

The  collection  for  INVERSION(n)  consists  of  the  sets 

2  2  2 

S,  ,S-  c  Fn  ,  inhere  S.  =  {(a..)eFn  ;  det(a.  .)  =  0)  and  S~  =  Fn  \S,  , 
1*2-  *  1  1  K  2  1 

and  the  output  functions  are  f=0  on  S^  (signalling  that  the 
input  matrix  is  not  invertible)  and  f  =  (f ^ , . . . , fnn)  with 
f . .  e  F(x, x  )  the  (i,j) -entry  of  the  inverse  of  the  n*n- 
matrix  (x..).  The  collections  for  the  first  three  problems  are 
similarly  obvious,  and  one  for  GCD  can  be  found  in  Strassen  [81]. 
IVe  write,  e.g.  INVERSION  for  the  sequence  (INVERSION(n) ) 

The  following  result  was  proved  by  Csanky  [76],  but 
only  for  fields  of  characteristic  zero. 

Theorem  1  For  any  field  F,  DETERMINANT,  INVERSION,  NONSINGULAR 

EQUATIONS,  CHARACTERISTIC  POLYNOMIAL  can  be  computed  in  parallel 
2 

time  O(log  n)  using  a  polynomially  bounded  number  of  processors. 
Branching  does  not  occur,  and  for  DETERMINANT  and  CHARACTERISTIC 
POLYNOMIAL  no  divisions  are  necessary. 

Proof  We  consider  ordinary  Gaussian  elimination  performed  on  an 

n*n-matrix  X  =  (x.  .)  of  indeterminates ,  with  pivots  chosen  on  the 

x  j 

diagonal.  This  yields  a  (sequential)  computation  of  det  X  in 


6 


time  O(n^)  (Coppersmith-Winograd  [81]:  0(n^‘^b)).  When  we 

execute  this  algorithm  on  the  n*n-identity  matrix,  the  only 

divisions  that  occur  are  by  1.  According  to  Strassen  [73]  we 

can  shift  the  indeterminates  x. .  by  6..  and  obtain  an  algorithm 

il  7  iJ  6 

c 

for  det  X  without  division  using  time  0(n  )  .  We  can  now  apply 

the  parallelization  method  of  Valiant-Skyum-Berkowitz-Rackof f 

[81]  to  obtain  a  parallel  algorithm  for  the  determinant  using 
2 

O(log  n)  time  and  a  polynomial  number  of  processors.  Neither 
division  nor  branching  occurs. 

Obviously  INVERSION  and  NONSINGULAR  EQUATIONS  are  not  harder 

than  DETERMINANT,  but  they  require  a  division  step  at  the  end  of 

the  computation.  For  the  characteristic  polynomial,  we  execute 

the  sequential  division-free  determinant  algorithm  on  X-tl. 

Each  step  computes  a  polynomial  in  t,  and  we  split  the  step  into 
2 

s(n+l)  operations  in  F  lx11  >  X1 2 ’  ‘ '  ”Xnn^  by  comPutinB  the  co~ 
efficients  of  t^,t^,...,tn  separately.  Parallelization  applies 
again  to  yield  the  result.  □ 

Remark .  The  above  process  of  getting  rid  of  divisions  and 
converting  to  a  parallel  computation  of  cost  O(logn(logn+logt) ) 
applies  in  principle  tu  any  sequential  computation  that  computes 
a  polynomial  of  degree  <n  in  time  <t.  In  avoiding  divisions, 
one  has  to  shift  the  indeterminates  so  that  only  divisions  by 
non- zero  field  elements  occur  if  the  indeterminates  are  set  to 
zero.  Since  it  might  not  always  be  clear  how  to  pick  the 
constants  required  for  this  shift,  the  conversion  may  become  non- 
uniform.  This  is  particularly  evident  over  finite  fields,  where 
one  might  have  to  make  a  (finite  algebraic)  field  extension. 
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Theorem  2  For  any  field  F,  GCD  can  be  computed  in  parallel 
2 

time  O(log  n) . 

Proof  Let  f,g  e  F[x]  be  non- zero,  deg  f  =  m<n  =  deg  g.  If 
f  is  not  a  constant  multiple  of  g,  then 

deg  gcd(f,g)  =  min{ie(M:3s  ,teF  [x] ,  deg  s<n-i  and 
deg(sf+tg)=i} . 

The  latter  condition  translates  into  the  following 
(n+m- 2i) x (n+m- 2i) -system  of  linear  equations: 
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So  for  f,g  as  above,  the  following  algorithm  computes  gcd(f,g) 
in  parallel  time  O(log2n):( 

1.  Compute  in  parallel  an, . . .  ,a  -  where  a.  =  det  A.  and  A 

urn,  i  x 

is  the  coefficient  matrix  of  S^. 

2.  Set  d  -  min{i:  a^rO}. 

3.  Compute  a  solution  (s,t)  of  S  , .  (Note  that  SJ  is  a  non 

d  d 

singular  system.) 

4.  Compute  gcd(f,g)  =  sf+tg.  Q 
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It  would  be  important  to  have  a  similar  result  for  the  ged  of 
two  integers,  and  we  ask  the 

OPEN  QUESTION  1:  Is  INTEGER  GCD  e  NC? 

4 .  Rank  of  matrices 

For  algorithms  computing  the  rank  of  matrices,  i.e.  the 
maximal  size  of  nonsingular  j submatrices,  we  can  restrict  atten¬ 
tion  to  square  matrices  (by  padding  with  zeroes  if  necessary)  . 

The  rank  cannot  in  general  be  considered  as  an  element  of  the 
ground  field,  and  we  have  to  make  some  output  convention  such  as 
the  following:  we  require  the  algorithm  to  compute 

f_,...,f  e  F (x,  , x, x  )  such  that  for  any  A  e  M  (F) 

In  11*  12*  *  nn'  '  n 

rank  (A)  =  min{i:  f^Aj^O}. 

We  remark  that  this  convention  is  inadequate  over  <C,  since  then 

2 

the  n  equations  f^(A)  =  ...  =  fn(A)  =  0  in  n  variables  should 
have  only  A=0  as  solution.  If  n>l,  then  this  is  impossible  over 
any  algebraically  closed  field.  See  the  end  of  this  section  for 
a  collection  for  RANK. 

If  c (A)  =  det(A-tl)  =  l  c.(A)t1  e  F[t]  is  the 

0<isn 

characteristic  polynomial  of  A  t  Mn(F)  *  let  US  cal1 
rankaig(A)  =  max{i:  cn_i(A)^0} 

the  algebraic  rank  of  A.  The  relation  with  rank (A)  is  described  by 

rank(A)  =  n-dim(nullspace  of  A) 

=  n-dim(eigenspace  of  A  for  eigenvalue  0); 

rank  .  (A)  =  n-order  of  c(A) 
alg 

=  n-multip lici ty  of  eigenvalue  0  in  c(A). 
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We  thus  have  rankalg(A)  <  rank(A)  for  any  matrix  A. 

A  =  q)  is  an  example  where  ranka-^(A)  <  rank  (A)  . 

This  is,  however,  an  "exceptional  case":  within  the  set 

Ur  of  matrices  A  e  Mr(F)  with  rank(A)  s  r,  the  subset  Vr  of  matrices 

with  ranka^^(A)  <  r  is  described  by  the  nontrivial  polynomial 

condition  "c  (A)  =  0".  Thus,  "most"  A  in  U  will  not  be  in  V  . 

n-r  r  r 

E.g.  if  F  is  algebraically  closed,  then  V  has  dimension  strictly 

less  than  the  irreducible  variety  U  . 

r 

Now  it  follows  from  Theorem  1  that  rank  ,  CA)  can  be 

a  ig 

computed  fast  in  parallel,  and  hence  also  rank(A)  provided 
rank(A)  =  ranka^  (A) .  In  the  sequel  we  look  for  ways  of  obtain¬ 
ing  the  latter  condition. 

Ibarra-Moran-Rosier  [80]  have  the  following  result: 

If  F  is  a  subfield  of  IR,  then  rank(A)  =  rank(AtA)  =  rank-  (A^A)  . 

axg 

2 

Hence  RANK  can  be  computed  in  parallel  time  O(log  n)  for  such  a 
field.  They  also  show  that  this  is  true  if  F  is  a  subfield  of  C 
and  one  is  allowed  to  use  complex  conjugation  in  the  algorithm. 

The  rank  question  has  some  nice  applications  in  combina¬ 
torial  complexity.  Consider  a  bipartite  graph  G  on  2n  nodes. 

We  associate  to  it  an  n*n-matrix  X  over  fl^x^^x^, . .  .  ,x  ) 
which  has  x^  in  the  (i ,  j ) -posi  tion  if  node  i  is  connected  to 
node  j,  and  zero  otherwise.  Then  the  maximal  size  of  a  matching 
in  G  is  equal  to  rank(Xj  (Edmonds [6 7 ]) .  By  substituting  randomly 
chosen  integers  (from  a  fixed  finite  range,  see  Lemma  1  below  ) 
for  the  indeterminates  and  computing  the  rank  of  such  matrices 
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over  <Q,  we  get  a  Monte  Carlo  algorithm  to  determine  the  maximal 

size  of  matchings  in  G,  and  hence  a  (non-uniform)  algorithm  for 

this  problem  whose  parallel  time  is  O(log  n) .  It  would  be 

interesting  to  have  an  algorithm  of  this  type  that  actually 

constructs  a  maximal  matching. 

Next  consider  a  directed  graph  with  each  edge  being  as-  j 

signed  an  integer  capacity  that  is  polynomially  bounded  by  the  numbe 

of  nodes  of  the  graph.  Feather  [81]  reduces  the  problem  of  finding 

the  maximum  flow  in  such  a  graph  to  the  above  maximal  matching 

2 

problem,  and  thus  obtains  an  0(log  n)  algorithm. 

It  is  interesting  to  note  that  without  the  restriction 

on  the  capacities  the  problem  is  log  space  complete  for  P 

(Goldschlager-Shaw-Staples  [81]),  and  hence  we  do  not  expect  it 

to  have  a  solution  in  parallel  time  O(log^n)  for  some  k. 

The  rank  algorithm  by  Ibarra-Moran-Rosier  provides  a 

nice  solution  for  fields  like  Q  and  iR.  For  the  general  case,  in 

particular  for  finite  fields,  we  have  the  following  result. 

Theorem  3  For  any  field  F,  one  can  compute  the  rank  of  nxn- 

matrices  over  F  with  a  Monte  Carlo  algorithm  with  error 

2 

probability  <  3/4  using  parallel  time  0(log  n)  and  a  polynomially 
bounded  total  number  of  processors.  Neither  division  nor  branch¬ 
ing  occur. 

Proof  Let  F  be  a  field,  n  £  IN  and  A  e  M  (F)  .  We  assume  some 

4 

finite  subset  P  s  F  of  cardinality  p  such  that  either  p  £  ^n 
or  P  =  F.  Furthermore,  we  assume  a  "random  element  generator" 
for  P  that  produces  in  one  step  of  a  processor  a  randomly  chosen 
element  from  P  with  respect  to  the  uniform  distribution  on  P. 

(Thus  if  #F  s  ^n,  we  can  take  any  large  enough  subset 
for  P;  the  proof  will  be  slightly  easier  in  this  case.  Otherwise, 


we  can  either  take  P=F,  or  compute  in  parallel  time  O(logn) 
a  large  enough  finite  algebraic  extension  field  of  F  to  reduce 
to  the  first  case.) 

We  perform  the  following  algorithm  to  compute  rank  A: 

1.  Choose  a  random  B  e  M  (P) . 

n 

2.  Compute  s  =  rank^^CAB). 

It  is  clear  from  Theorem  1  that  this  algorithm  uses  parallel  time 
2 

Oflog  n)  ,  a  polynomially  bounded  number  of  processors,  and 
neither  division  nor  branching.  Also  s  s  rank(A).  The  remainder 
of  this  proof  is  devoted  to  establishing  the  estimate 

Prob ( {s=rank (A) } )  £  1/4. 

Let  N  =  {l,...,n),  reN,  m  =  (£)  and  (^)  =  (IcN:  #I=r}. 

For  any  n*n-matrix  M  and  I ,  J  e  (^)  we  write  Mjj  for  the  determinant 
of  the  IxJ-minor  of  M. 

The  idea  of  the  proof  is  the  follo\%’ing.  Rank(A)  >  r 
iff  the  mxm-array  of  all  minors  Ajj  has  a  nonzero  entry.  Of 
course  it  is  too  expensive  to  compute  this  whole  array,  but  a 
piece  of  information,  namely  the  sum  of  the  diagonal  entries, 
is  easy  to  compute.  Introducing  a  randomly  chosen  matrix,  it 
turns  out  that  one  has  a  good  chance  that  this  piece  of 
information  is  sufficient. 

A  convenient  tool  for  representing  the  array  is  the 
exterior  power  Ar(Fn)  (for  definitions,  see  e.g.  Greub  [78]) 
which  is  an  m-dimensional  vector  space  over  F  with  a  basis  indexed 
by  (y) .  We  have  a  natural  mapping 

Ar:  Mn(F)  -  End (Ar (Fn) )  =  Mm(F) 
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which  maps  a  matrix  B  to  the  m*m-array  of  its  rxr-minors, 

i.e.  (Ar(B) ) j j  =  Bjj  for  I ,  J  e  (^)  .  Ar  is  multiplicative 

(Lagrange's  identity,  see  Greub  [78]),  and  obviously  A  (id)  =  id 

For  any  B  e  M  (F)  the  coefficients  of  the  characteristic  poly- 
n 

nomial  can  be  expressed  as 

cn_r(B)  =  (-Dn“T  I  f  Bjj  =  (-l)n‘rtr(Ar(B)), 

where  tr  denotes  the  trace  (Greub  [7S],  Prop. 7. 5.1),  and  thus 

(-l)n"rcn_r(AB)  =  tr  (Ar (AB) )  =  tr(Ar (A)Ar (B) ) 

=  ^  N-  AJIBIJ» 

I ,Je(*) 

where  AeMn(F)  is  our  given  matrix.  Now  let  r  =  rank(A),and 
consider  the  polynomial 


hA  l  „  AJI^IJeB^ll,^l2>  ’  *  *  ,ynn^ 


l,Je(J) 


where  Y  is  the  n*n-matrix  whose  (i,j)-entry  is  the  indeterminate 
y^j .  Since  some  Ajj  is  nonzero  and  no  cancellation  occurs,  h^  is 
nonzero.  For  any  B£Mn(F)  we  have 

hA(B)  ^  0  =>  cn_r(AB)  t  0 

=>  r  £  rankalg(AB)  £  rank(AB)  s  rank(A)  =  r 
*=>  rankalg(AB)  =  r. 


and  it  is  sufficient  to  show  that 

-  n2 

f{B£Mn(P):hA(B)=0}  s  Jpn  . 
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In  the  case  p  &  ^n,  it  follows  from  Lemma  1  below  that 
#{BcMn(P):  h A ( B ) =  0 )  s  II  ^  <  |  pn\ 


Now  assume  that  F  is  finite  and  P=F.  Let 


D  - 


1  0 

o  lo-o 


£  Mn(F) 


with  rank (D)  =  r.  Then  hD  =  Y 
Lemma  2(ii)  below  1 


(1, . . .  ,r} , {1, . 


#(Bain(F):  hD(B)=0}  s  £pn  . 


Now  there  exist  U.VeGL  (F)  such  that  A  =  UDV. 

n 

we  have  i 


, ,  and  bv  the 
r) 


For  any  B  e  Mn(F) 


hA(B)  =  tr  (Ar (AB) )  =  tr (Ar (UDVB) ) , 

hD(B)  =  tr  (Ar  (DB) )  =  tr (Ar  (U) Ar (DB) Ar (U) " X) 

=  tr (Ar(UDBU"1))  . 


We  get  the  desired  estimate,  which  completes  the  proof  of 
Theorem  3: 

#{BeMn(F):  hA(B)=0} 

«  #{BeMn(F):  tr (Ar (UDVB) ) =0} 

“  “{BeM  (F) :  tr (Ar  (UDBU _1) ) =0} 
n  2 
-  #{BeMn(F):  hD(B)=0}  *  |rn  . 

The  second  equality  follows  from  the  fact  that  B  h-  V  BU  J 
bijection  from  Mn(F)  to  Mr(F)  .  0 


V 
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For  the  two  lemmas  that  establish  the  probabilistic 
estimates,  we  assume  the  following  set-up  which  is  slightly 
more  general  than  required:  A  field  F,  PsF  finite  with  p^2  elements, 
d,ncN,  indc terminates  > x22  » *  *  * » xnn  over  F»  anc*  an  nxn-matrix 

G  =  of  univariate  polynomials  gjj£F£xijl  of  degree  £d.  ( 

In  Lemma  1,  we  estimate  the  probability  that  a  linear  combina¬ 
tion  of  minors  of  G  is  zero  when  the  entries  are  randomly  chosen 
from  P. 

Lemma  1 


Let  f  =  ycTTGTT  f  0,  where  c..cF  and  the  sum  is  over 
L  IJ  IJ  ij 


all  I,J  £  {l,...,n}  with  #1  =  #J.  Then  the  probability  that  f 
vanishes  is  <nd/p . 


Proof  Let  q  =  p  n 


#(AeM  (P):  f(A)=0}  be  the  probability  that  f 
n 


vanishes.  We  show  that  q  £  nd/p  by  induction  on  n.  The  case  n=l 
being  obvious,  we  can  assume  that  n£2,  f  is  non-constant,  and  that 


x  occurs  in  f.  Thus 
nn 


f  =  g  h  +  a, 
&nn 


h  = 


(I^J)cIjGl\{n},J\{n} 
ne  InJ 


/  0, 


g  £  FCx  ]\F, 
nn  nn 


and  x  does  not  occur  in  a.  Then  h  satisfies  the  hypothesis 


nn 


for  Lemma  1  with  n  replaced  by  n-1,  and 


#{AeM  (P):  f(A)=0} 
n 

s  #{AcM  (P):  h (A) =0}+  #{AeM  CP)  : 
n  n 

h(A)  f  0  and  gnn(A)  =  -a(A)hCA)"1} 
s  (nl)d  pn2_1  +  dpn  _1  =  ndpn  _1, 
hence  q  s  nd/p. 


0 
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A  similar  inductive  argument  shows  that  for  an  arbitrary  poly¬ 
nomial  f  in  m  indeterminates  and  of  degree  sd  in  each  in¬ 
determinate  the  probability  of  vanishing  is  <md/p.  Applying 
this  general  observation  to  the  above  situation  would  yield  the 
2 

estimate  q  <  n  d/p. 

Lemma  2  gives  a  sharp  estimate  of  the  probability  that  a 

matrix  of  polynomials  is  singular,  he  introduce  the  function 

u(t,n)  =  1-  n  (1-t1)  for  0<t<l  and  nfN.  u  is  monotonely 
l<isn 

increasing  in  both  arguments. 


Lemma  2  In  the  set-up  described  before  Lemma  1,  assume  that 
no  is  constant,  and  let  q  be  the  probability  that  det(G) 
is  zero.  Then 


(i)  q  <  u(d/p,n) . 

(ii)  if  d=l,  then  q  <  3/4. 

(iii)  if  d=l  and  P  is  a  field,  then  q  =  u(l/p,n) . 

Proof  Let  t  =  d/p. 

(i)  We  have  to  show  that 
2 

q  =  p  #{AeMn(P):  det(g(A))=0}  <  u(t,n), 

where  g(A)  is  the  matrix  with  (i,j) -entry  For  lsr<n, 

let  T  be  the  set  of  all  rxn-matrices  with  entries  from  P, 
r 

and  Sr  =  {AeTr:  rank (g(A) ) =r} .  Now  let  A  e  Sr  ^  and  I  c  {l,...,n 
such  that  #1  =  r-1  and  A^  r  lj  j  /  0.  A  vector  y  e  Fn  which 
is  linearly  dependent  on  the  row  vectors  of  g(A)  is  determined  by 
its  entries  y^  for  iel.  Hence  there  are  at  most  pr~*dn  r+* 
vectors  zeT^  such  that  g(z)  is  linearly  dependent  on  the  rows  of 
g (A) .  Thus 


Figure.  The  upper  bound  xi(t,n)  on  the  probability  that  an 
nXn-matrix  is  singular.  The  entries  of  the  matrix  are  non¬ 
constant  polynomials  of  degree  ^d  over  an  arbitrary  field  F 
whose  arguments  arc  randomly  chosen  from  a  subset  of  F  with 
p  elements,  and  t-d/p.  The  values  n=l,2,3,4  are  shown,  and 
n=oo  shows  an  upper  bound  for  any  nel\T. 


•****• 
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#sr  ^  (pn-Pr-1dn-r+1)  #sr_1 

n,.  *n-r+l.  uc 

=  P  (1‘t  ) 

and  by  induction  on  r  wc  get 

#S  2:  prn  n  (1-t1). 
r  n-r<i<n 

Thus 

2  2 

q  ■  p  • (p  -*sn) 

s  1  -  n  (1-t1)  =  u(t,n) . 
lsisn 

(ii)  For  Ost<l,  let 

v(t)  =  lim  u(t,n)  =  1-11  (1-t1). 
n->“  lsi 

The  product  in  v  is  uniformly  convergent  on  every  interval  [0,a] 

with  a<l  (see  Henrici  1771,  §8.2  for  a  discussion  and  the  relation 

of  v  with  combinatorial  questions),  and  v  is  a  monotonely 

increasing  function.  In  particular,  for  0<t<l/2  and  n  elN  we  have 

0  s  u(t,n)  <  v ( t)  s  v(l/2)  =  0.7112. . .<3/4. 

This  proves  (ii),  since  then  t  =  1/p  s  1/2. 

(iii)  Under  the  hypotheses  of  (iii),  one  can  replace 
r-1  n-r+1  r-*! 

"at  most  p  d  "by  "exactly  p  "  in  the  argument  for  (i), 
and 

#sr  =  Prn  n  (l-t1) 

n-r<isn 

follows.  This  implies  q  =  u(t,n).  0 

Statement  (iii)  shows  that  the  estimate  in  (i)  is  sharp. 

The  Figure  shows  the  increasing  functions  u(t,n)  for  lsns4,  and 
their  limiting  curve  v(t)  =  u(t,~).  We  remark  that  the  estimate 
of  (i)  also  holds  when  the  determinant  is  replaced  by  the  permanent; 
the  proof  is  slightly  different. 
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When  interpreted  as  the  decision  problem  "is  A  in 

U  ~  {BeM  (F) :  rank (B) <r} the  above  Monte  Carlo  algorithm 
r  n 

always  answers  correctly  if  AeU^.,  and  correctly  with  probability 
2  1/4  if  A4U^.  We  can  improve  this  behaviour  to  get  an  algorithm 
that  always  answers  correctly  and  with  high  probability  has  a 
low  parallel  running  time. 
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Theorem  4.  For  any  field  F,  there  exists  a  Monte  Carlo 
algorithm  for  the  rank  of  nxn-matr ices  that  uses  parallel  time 
O(log2n),  no  divisions  and  a  polynomial  number  of  processors 
and  that  either  returns  the  rank  of  the  input  matrix  (and  a 
maximal  nonsingular  minor)  or  reports  that  it  failed.  The 
probability  of  the  second  case  is  s2  n. 

Proof.  Let  AfM  (F) .  For  l<isn  let  A.eM  (F) 

consist  of  the  first  i  rows  of  A  and  zero  rows  otherwise,  and 

=  rank(A^).  Apply  the  algorithm  of  Theorem  3  independently 

6n  times  in  parallel  to  each  A^ ,  and  let  s^  be  the  maximum  of 

the  numbers  computed  for  A..  Thus  s.<r.,  and  for  each  i 

l  ii 

Probas^r.})  <  (3/4) 6n. 

Let  I  *  ^i:  lsisn  and  sjL.i<s^J  (with  Sp=0),  and  A  *  cM^(F)  be  the 
matrix  whose  i-th  row  is  the  i-th  row  of  A  if  iel  and  zero  other¬ 
wise.  Perform  the  computation  that  was  done  above  for  the  rows 
of  A  now  for  the  columns  of  A*  to  obtain  a  set  J  c  {l,...,n>. 

The  parallel  time  for  all  this  is  O(log2n),  and 


Prob({the  IxJ-minor  of  A  is  not  a  maximal  (square) 

nonsingular  minor})  s  2n(4-)^n  s  2  n. 

4 

If  #1  /  #3  then  report  failure.  Else  apply  the  determinant 

algorithm  of  Theorem  1  to  compute  A.  T  and  each  A.,r  with  l£K, 

1 J  KL 

JcL,  =  |L  =  fl+1  in  parallel  time  O(log2n) .  If  either  Ajj=0  or 
some  such  A^  is  nonzero,  then  report  failure.  Otherwise  return 
sn  =  rank(A)  and  the  IxJ-minor  as  a  maximal  nonsingular  minor.  □ 


It  is  now  clear  what  is  an  appropriate  collection  (f,S)  for 

RANK(n).  For  (Kr&n,  we  have 

2 

=  (AeFn  :  rank(A)=r} 

2 

and  f:  Sr  ->  F  maps  A  to  its  "top  left"  nonsingular  r*r-minor 

which  is  defined  along  the  lines  of  the  above  proof  (identifying 
2 

Fn  and  Mn(F)  in  a  natural  way) . 

Thus  the  condition  is  that  Ajj  t  0,  for  any  k<iel  the 
rows  with  index  (I\{i})u{k)  of  A  have  rank  <r,  and  for  any  k<jeJ 
the  columns  with  index  (J\{j})u{k)  of  the  I-rows  of  A  have 
rank  <r.  If  one  wants  f  to  consist  of  rational  functions  on 
each  set  of  the  partition,  one  has  to  further  split  up  Sr 
according  to  the  value  of  (I,J). 

The  rank  algorithm  over  R  of  Ibarra-Moran-Rosier  can 
be  interpreted  as  avoiding  the  Monte  Carlo  aspect  by  noting  that 
for  B=At 


hAW  =  T  *  °- 

(r) 

Using  their  algorithm  and  the  above  construction,  we  get  a 

2 

deterministic  algorithm  that  uses  parallel  time  O(log  n)  and 
computes  a  maximal  nonsingular  minor. 

Can  we  have  a  similar  improvement  for  arbitrary  fields?  Of 

course  our  Monte  Carlo  algorithm  yields  non-uniform  algorithms 

2 

using  parallel  time  0(log  n) .  It  would  be  particularly  interest¬ 
ing  to  have  an  answer  to  the  following 

2 

OPEN  QUESTION  2:  Can  RANK  be  computed  in  parallel  time  O(log  n) 
over  finite  fields,  using  a  uniform  family  of  deterministic 
algorithms? 


5 .  Reductions 

We  have  considered  a  number  of  algebraic  problems 

2 

whose  parallel  complexity  is  now  known  to  be  O(log  n)  .  Whether 
or  not  the  parallel  complexity  for  any  of  these  (and  some  closely 
related)  problems  can  be  reduced  to  O(log  n)  remains  a  funda¬ 
mental  challenge  for  all  of  complexity  theory  (see  for  example 
Borodin  [82]  and  Valiant  [82]).  It  is  natural  then  to  study 
the  relative  complexity  of  these  problems. 

Valiant  [82]  introduced  the  notion  of  algebraic  projections 
as  a  strong  type  of  reducibility  between  polynomials.  For 
collections  P  =  (P(n))  and  P'  =  (P'(n)),  we  write  P  <  P'  to 
informally  denote  the  fact  that  P  can  be  reduced  to  P'  essentially 
by  multiple  use  of  projections  and  possibly  some  simple  (i.e. 
of  O(log  n)  parallel  complexity)  arithmetic  transformations. 

A  precise  definition  for  <  will  not  be  developed  here. 

Given  any  reasonable  definition  of  reducibility  P  <  P',  and 
certainly  for  the  specific  reductions  given  in  this  section,  it 
follows  that  T(P)  and  T(P')  satisfy  T(P)  =  0(T(P')),  using 
the  fact  that  for  all  interesting  collections  P,  log  n  =  0(T(P(n))). 
We  write  P  ~  P'  to  denote  P  £  P*  and  P'  <  P,  and  we  write  P  P'  +  P 
to  denote  that  both  P'  and  P"  are  used  in  the  reduction. 

Csanky  [76]  has  the  following  reductions: 
a)  Over  any  field, 

INVERSION  *  NONSINGULAR  EQUATIONS 

s  DETERMINANT  s  CHARACTERISTIC  POLYNOMIAL. 
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b)  The  characteristic  polynomial  can  be  computed  by 
evaluating  it  at  several  points  (using  a  determinant 
algorithm)  and  interpolating.  If  the  field  contains 
the  necessary  roots  of  unity  to  support  a  Fast  Fourier 
Transform,  then  the  interpolation  can  be  performed  in 
O(log  n) ,  using  the  roots  of  unity  as  interpolation 
points.  We  then  have 

CHARACTERISTIC  POLYNOMIAL  <  DETERMINANT. 

This  holds  non-uniformly  over  arbitrary  fields,  by 
extending  the  field  by  the  necessary  roots  of  unity. 

c)  If  the  field  has  characteristic  zero,  then 
DETERMINANT  <  NONSINGULAR  EQUATIONS. 

Thus,  over  C  all  four  problems  are  equivalent. 

Next,  we  consider  the  problems  BASIS  (  =  computing  a 
maximal  linearly  independent  subset  of  a  given  set  of  vectors), 
EQUATIONS  (=  deciding  whether  a  (possibly  singular)  system  of 
linear  equations  has  a  solution,  and  in  the  affirmative  case, 
computing  one  solution)  and  NULLSPACE  (=  computing  a  basis  for 
the  nullspace  of  a  matrix) . 

For  any  field,  we  have  the  following  reductions: 

a)  BASIS  2  RANK  5  NULLSPACE  <  RANK  *  INVERSION, 

b)  EQUATIONS  £  RANK  +  INVERSION. 

Proof  a)  We  note  that  the  rank  is  implied  directly  by  BASIS  or 
NULLSPACE.  The  reduction  BASIS  £  RANK  is  obtained  by  including 
v^  in  the  basis  iff  rankfVj, . . . ,v^_ <  rank(Vj , . . . , v^)  . 
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For  the  remaining  reduction,  we  can  compute  a  maximal  nonsingular 
minor  M  for  an  input  matrix  A,  by  applying  BASIS  (say)  first 
to  the  columns  of  A  and  then  to  the  rows  of  the  selected 
columns.  Without  loss  of  generality  let  M  be  the  upper  r*r- 
submatrix  of  A.  For  each  i  satisfying  r+l<i<n,  we  can  solve  the 
nonsingular  systems  Mx^  =  y^ ,  when  y^  denotes  the  first  r  rows 
of  the  i-th  column  of  A.  A  basis  for  the  nullspace  of  A  then 

x* 

consists  of  the  vectors  .  ,  r<i<n,  that  is,  is  extended  by 

-1 

: 

'  0  >• 

0’s  except  for  a  -1  in  the  i-th  position. 

b)  For  EQUATION'S,  we  again  compute  a  maximal  nonsingular 
minor  M  of  the  input  matrix,  solve  the  corresponding  nonsingular 
system  of  linear  equations,  set  all  indeterminates  not  corres¬ 
ponding  to  columns  of  M  equal  to  zero,  and  check  i\’hether  this 
constitutes  a  solution.  If  it  does  not,  then  the  input  system 
has  no  solution  at  all.  □ 

Allowing  non-uniform  reductions,  and  allowing  the  concept 
of  reducibility  to  use  linear  transformations,  it  follows  (by 
Theorem  3  and  Csanky's  reductions)  that  RANK  <  CHARACTERISTIC 
POLYNOMIAL  <  DETERMINANT  and  hence  EQUATIONS  <  DETERMINANT. 

We  also  know  (from  Theorem  2)  that  GCD  <  DETERMINANT. 

Wc  are  left  with  a  number  of  potential  reductions  which 
arc  either  completely  unresolved  or  hold  only  in  the  non- 
uniform  case.  In  particular,  we  ask  the  following: 


Open  Question  3:  a)  Are  any  of  the  above  problems  <  GCD? 

b)  Is  DETERMINANT  <  NONSINGULAR  EQUATIONS 
for  arbitrary  fields? 

The  latter  question  is  also  open  in  the  sequential  setting  (see 
Baur-Strassen  [82]). 

6 .  Conclusion 

We  have  laid  the  foundation  for  what  might  be  called  a 

"theoretical  package  for  parallel  symbolic  manipulation". 

The  problems  investigated  include  ged  of  polynomials,  solution 

of  linear  equations,  determinant  and  rank  of  matrices.  They 

2 

all  can  be  solved  in  parallel  time  O(log  (input  size));  for 
rank  a  Monte  Carlo  method  is  used. 

Further  important  routines  for  this  "theoretical  package" 
include  factoring  polynomials  over  finite  fields  (which  provided 
the  original  motivation  for  this  paper;  consider  the  critical 
role  of  the  GCD  and  NULLSPACE  in  Berlekamp’s  [70]  factoring 
algorithm),  continued  fraction  and  partial  fraction  expansion 
of  rational  functions,  Pade  approximation  of  power  series,  and 
interpolation.  They  will  be  discussed  in  a  forthcoming  paper 
(von  zur  Gathen  [82]). 

Finally,  as  in  sequential  computation,  we  remark  that 

integer  problems  are  often  more  difficult  to  understand  than  the 

corresponding  polynomial  problem.  For  example,  it  remains  an 

2 

open  problem  as  to  whether  or  not  a  fast  (e.g.  O(log  n))  parallel 
algorithm  exists  for  the  ged  of  two  n-bit  numbers.  An  even  more 
challenging  open  problem  concerns  the  parallel  complexity  of 
determining  primality. 
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